
    Specification and Simulation of Statistical Query Algorithms for Efficiency and Noise Tolerance

    Abstract: A recent innovation in computational learning theory is the statistical query (SQ) model. The advantage of specifying learning algorithms in this model is that SQ algorithms can be simulated in the probably approximately correct (PAC) model, both in the absence and in the presence of noise. However, simulations of SQ algorithms in the PAC model have non-optimal time and sample complexities. In this paper, we introduce a new method for specifying statistical query algorithms based on a type of relative error and provide simulations in the noise-free and noise-tolerant PAC models which yield more efficient algorithms. Requests for estimates of statistics in this new model take the following form: “Return an estimate of the statistic within a 1 ± μ factor, or return ⊥, promising that the statistic is less than θ.” In addition to showing that this is a very natural language for specifying learning algorithms, we also show that this new specification is polynomially equivalent to standard SQ, and thus known learnability and hardness results for statistical query learning are preserved. We then give highly efficient PAC simulations of relative error SQ algorithms. We show that the learning algorithms obtained by simulating efficient relative error SQ algorithms, both in the absence of noise and in the presence of malicious noise, have roughly optimal sample complexity. We also show that the simulation of efficient relative error SQ algorithms in the presence of classification noise yields learning algorithms at least as efficient as those obtained through standard methods, and in some cases improved, roughly optimal results are achieved. The sample complexities for all of these simulations are based on the dν metric, a type of relative error metric useful for quantities which are small or even zero. We show that uniform convergence with respect to the dν metric yields “uniform convergence” with respect to (μ, θ) accuracy. Finally, while we show that many specific learning algorithms can be written as highly efficient relative error SQ algorithms, we also show that, in fact, all SQ algorithms can be written efficiently by proving general upper bounds on the complexity of (μ, θ) queries as a function of the accuracy parameter ε. As a consequence of this result, we give general upper bounds on the complexity of learning algorithms achieved through the use of relative error SQ algorithms and the simulations described above.
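A (μ, θ) query of the form quoted above can be simulated from random examples with a multiplicative Chernoff bound. The sketch below is a minimal illustration only; the sample-size constant and the `oracle` interface are assumptions, not the paper's exact construction.

```python
import math
import random

def relative_error_sq(oracle, chi, mu, theta, delta):
    """Estimate P = Pr[chi(example) = 1] within a (1 +/- mu) factor,
    or return None (playing the role of 'bottom') when the empirical
    estimate falls below theta.  The sample size m follows a
    multiplicative Chernoff bound; the constant 3 is illustrative."""
    m = math.ceil(3 * math.log(2 / delta) / (mu ** 2 * theta))
    hits = sum(chi(oracle()) for _ in range(m))
    p_hat = hits / m
    return None if p_hat < theta else p_hat

# Toy oracle: labeled coin flips with Pr[label = 1] = 0.3.
random.seed(0)
oracle = lambda: 1 if random.random() < 0.3 else 0
estimate = relative_error_sq(oracle, lambda x: x, mu=0.1, theta=0.05, delta=0.01)
```

Since the true statistic 0.3 is well above θ = 0.05, the query returns a numeric estimate rather than ⊥; a predicate that is never satisfied drives the estimate below θ and triggers the ⊥ branch.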

    General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting

    Abstract: We derive general bounds on the complexity of learning in the statistical query (SQ) model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the SQ model. The SQ model was introduced by Kearns to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result: the first general upper bounds on the complexity of strong SQ learning. Since all SQ algorithms can be simulated in the PAC model with classification noise, we also obtain general upper bounds on learning in the presence of classification noise for classes which can be learned in the SQ model.
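Why combining weak hypotheses can yield strong learning is visible in a Hoeffding-style calculation, shown here only as intuition: it assumes the hypotheses err independently, which actual boosting constructions enforce indirectly by reweighting the distribution between rounds.

```latex
% If T hypotheses each err with probability at most 1/2 - \gamma,
% independently, a majority vote errs only when at least T/2 of them
% err, so by Hoeffding's inequality:
\Pr[\text{majority vote errs}] \;\le\; \exp\!\left(-2\gamma^{2} T\right)
% hence error \varepsilon is reached after
% T = O\!\left(\gamma^{-2}\ln(1/\varepsilon)\right) rounds.
```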

    Automatic Ground Truth Expansion for Timeline Evaluation

    The development of automatic systems that can produce timeline summaries by filtering high-volume streams of text documents, retaining only those that are relevant to a particular information need (e.g. topic or event), remains a very challenging task. To advance the field of automatic timeline generation, robust and reproducible evaluation methodologies are needed. To this end, several evaluation metrics and labeling methodologies have recently been developed, focusing on information-nugget or cluster-based ground truth representations. These methodologies rely on human assessors manually mapping timeline items (e.g. tweets) to an explicit representation of what information a 'good' summary should contain. However, while these evaluation methodologies produce reusable ground truth labels, prior works have reported cases where such labels fail to accurately estimate the performance of new timeline generation systems due to label incompleteness. In this paper, we first quantify the extent to which timeline summary ground truth labels fail to generalize to new summarization systems, then we propose and evaluate new automatic solutions to this issue. In particular, using a depooling methodology over 21 systems and across three high-volume datasets, we quantify the degree of system ranking error caused by excluding those systems when labeling. We show that when considering lower-effectiveness systems, the test collections are robust (the likelihood of systems being mis-ranked is low). However, we show that the risk of systems being mis-ranked increases as the effectiveness of systems held out from the pool increases. To reduce the risk of mis-ranking systems, we also propose two different automatic ground truth label expansion techniques. Our results show that our proposed expansion techniques can be effective for increasing the robustness of the TREC-TS test collections, markedly reducing the number of mis-rankings by up to 50% on average among the scenarios tested.
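The depooling idea can be sketched as follows. The toy runs, pool depth, and the use of precision@k as the effectiveness measure are illustrative assumptions, not the paper's protocol: a system that contributes unique relevant items to the pool loses score (and can be mis-ranked) when it is held out of labeling.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` items from every contributing run."""
    pool = set()
    for run in runs:
        pool.update(run[:depth])
    return pool

def precision_at_k(run, qrels, k):
    return sum(1 for item in run[:k] if item in qrels) / k

# Toy setup: items t1..t9 are truly relevant; each "system" returns a ranked list.
relevant = {f"t{i}" for i in range(1, 10)}
runs = {
    "sysA": ["t1", "t2", "t3", "x1", "x2"],
    "sysB": ["t2", "t3", "x3", "t1", "x4"],
    "sysC": ["t7", "t8", "t9", "x5", "x6"],  # finds relevant items no one else does
}

def qrels_from_pool(held_out=None):
    """Assessors only label pooled items, so qrels = pool intersect relevant."""
    contributing = [r for s, r in runs.items() if s != held_out]
    return build_pool(contributing, depth=5) & relevant

full = {s: precision_at_k(r, qrels_from_pool(), 5) for s, r in runs.items()}
# Depool: rebuild labels with sysC excluded from the pool, then re-score everyone.
depooled = {s: precision_at_k(r, qrels_from_pool("sysC"), 5) for s, r in runs.items()}
```

Under the full pool all three systems tie; once sysC is held out, its unique relevant items go unlabeled and its measured effectiveness collapses, which is exactly the ranking error the depooling methodology quantifies.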

    On Estimating the Size and Confidence of a Statistical Audit

    We consider the problem of statistical sampling for auditing elections, and we develop a remarkably simple and easily calculated upper bound for the sample size necessary for determining, with probability at least c, whether a given set of n objects contains b or more “bad” objects. While the size of the optimal sample drawn without replacement can be determined with a computer program, our goal is to derive a highly accurate and simple formula that can be used by election officials equipped with only a simple calculator.
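The exact optimal sample size for sampling without replacement is a short hypergeometric computation, sketched below alongside a closed-form approximation in the spirit of the paper (the precise form of the approximation here is an assumption for illustration, not necessarily the paper's formula).

```python
from math import comb

def detection_prob(n, b, u):
    """P(a uniform sample of size u, drawn without replacement from n objects,
    contains at least one of the b bad objects)."""
    if u > n - b:
        return 1.0
    return 1 - comb(n - b, u) / comb(n, u)

def min_sample_size(n, b, c):
    """Smallest u whose detection probability is at least c (exact, by search)."""
    for u in range(n + 1):
        if detection_prob(n, b, u) >= c:
            return u

def approx_sample_size(n, b, c):
    """Calculator-friendly closed-form approximation (illustrative form)."""
    return (n - (b - 1) / 2) * (1 - (1 - c) ** (1 / b))
```

For n = 1000 objects, b = 10 bad objects, and confidence c = 0.95, the approximation lands within a couple of ballots of the exact hypergeometric answer.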

    Bayes Optimal Metasearch: A Probabilistic Model for Combining the Results of Multiple Retrieval Systems

    We introduce a new, probabilistic model for combining the outputs of an arbitrary number of query retrieval systems. By gathering simple statistics on the average performance of a given set of query retrieval systems, we construct a Bayes optimal mechanism for combining the outputs of these systems. Our construction yields a metasearch strategy whose empirical performance nearly always exceeds the performance of any of the constituent systems. Our construction is also robust in the sense that if “good” and “bad” systems are combined, the performance of the composite is still on par with, or exceeds, that of the best constituent system. Finally, our model and theory provide theoretical and empirical avenues for the improvement of this metasearch strategy.
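A naive-Bayes flavour of this combination can be sketched as follows. The binary "in the top k" evidence, the likelihood values, and the independence assumption across systems are illustrative simplifications; the paper's model is richer.

```python
import math

def combine(ranked_lists, p_top_given_rel, p_top_given_irr, top_k=3):
    """Score each document by summing per-system log-likelihood ratios of the
    evidence 'document appears in this system's top_k' (naive independence),
    then rank by total score."""
    docs = set().union(*ranked_lists)
    scores = {}
    for d in docs:
        llr = 0.0
        for run in ranked_lists:
            in_top = d in run[:top_k]
            p_rel = p_top_given_rel if in_top else 1 - p_top_given_rel
            p_irr = p_top_given_irr if in_top else 1 - p_top_given_irr
            llr += math.log(p_rel / p_irr)
        scores[d] = llr
    return sorted(docs, key=lambda d: -scores[d])

runs = [["a", "b", "c", "z"], ["b", "a", "d", "z"], ["a", "c", "b", "z"]]
# Assumed training statistics: relevant documents reach a system's top 3
# far more often (0.8) than irrelevant ones do (0.1).
fused = combine(runs, p_top_given_rel=0.8, p_top_given_irr=0.1)
```

Documents endorsed by every system accumulate large positive log-likelihood ratios and rise to the top of the fused list, while a document no system places highly sinks to the bottom.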

    Efficiency of Treated Domestic Wastewater to Irrigate Two Rice Cultivars, PK 386 and Basmati 515, under a Hydroponic Culture System

    The growing human population continues to exacerbate freshwater scarcity, and the availability of freshwater for crop irrigation has become challenging. The present study aimed to use domestic wastewater (DWW) for the irrigation of two rice cultivars (CVs) after treatment with the bacterial strain Alcaligenes faecalis MT477813 under a hydroponic culture system. The first part of this study focused on the bioremediation and analysis of the physicochemical parameters of DWW to compare pollutants before and after treatment. The biotreatment of DWW with the bacterial isolate showed more than 90% decolourisation, along with a reduction in contaminants. The next part of the study evaluated the impacts of treated and untreated DWW on the growth of the two rice cultivars, PK 386 and Basmati 515, under a hydroponic culture system, which supplies nutrients and water to plants and can give yields equal to or higher than soil. Growth parameters such as shoot and root length and the wet and dry weights of the rice plants grown in treated DWW were considerably higher than those of plants grown in untreated DWW; enhanced growth of both rice cultivars in biotreated DWW was thus observed. These results demonstrate the bioremediation efficiency of the bacterial isolate and the utility of DWW for rice crop irrigation subsequent to biotreatment.

    Disposition Kinetics and Optimal Dosage of Ciprofloxacin in Healthy Domestic Ruminant Species

    The purpose of this experimental study was to determine the disposition kinetics and optimal dosages of ciprofloxacin in healthy domestic ruminant species, including adult female buffalo, cow, sheep and goat. The drug was given as a single intramuscular dose of 5 mg/kg. The plasma concentrations of the drug were determined with HPLC, and pharmacokinetic variables were calculated. The biological half-life (t1/2β) was longest in cows (3.25 ± 0.46 h), intermediate in buffaloes (3.05 ± 0.20 h) and sheep (2.93 ± 0.45 h), and shortest in goats (2.62 ± 0.39 h). The volume of distribution (Vd) was 1.09 ± 0.06 l/kg in buffaloes, 1.24 ± 0.16 l/kg in cows, 2.89 ± 0.30 l/kg in sheep and 3.76 ± 0.92 l/kg in goats. Total body clearance (ClB), expressed in l/h/kg, was lowest in buffaloes (0.25 ± 0.02), followed by cows (0.31 ± 0.02) and sheep (0.75 ± 0.04), and highest in goats (1.09 ± 0.11). An optimal dosage regimen for a 12-h interval consisted of 5.17, 5.62, 6.54 and 6.10 mg/kg body weight as priming doses and 4.84, 5.37, 6.26 and 5.91 mg/kg body weight as maintenance intramuscular doses in buffalo, cow, sheep and goat, respectively. The manufacturers of ciprofloxacin recommend that the 5 mg/kg dose be repeated after 24 h; however, the investigated dosage regimen may be repeated after 12 h to maintain the MIC at the end of the dosage interval. It is therefore imperative that an optimal dosage regimen be based on disposition kinetics data determined in the species and environment in which a drug is to be employed clinically.
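Under a simple one-compartment model with first-order elimination (an idealisation: intramuscular absorption is ignored, and the target MIC used below is a hypothetical placeholder, not a value from the study), maintenance and priming doses for a dosing interval τ follow from the steady-state trough condition:

```python
import math

def dosage_regimen(t_half, vd, mic, tau):
    """Doses (mg/kg) that hold the steady-state trough concentration at `mic`
    (mg/l) for a drug with half-life `t_half` (h) and volume of distribution
    `vd` (l/kg), dosed every `tau` hours.  Assumes one-compartment kinetics
    with instantaneous absorption."""
    k = math.log(2) / t_half                    # first-order elimination rate (1/h)
    maintenance = mic * vd * (math.exp(k * tau) - 1)
    priming = mic * vd * math.exp(k * tau)      # reaches steady state immediately
    return priming, maintenance

# Goat parameters from the study; an MIC of 0.1 mg/l is a hypothetical target.
priming, maintenance = dosage_regimen(t_half=2.62, vd=3.76, mic=0.1, tau=12)
```

The priming dose exceeds the maintenance dose by exactly the trough amount (mic × vd), which is why priming and maintenance doses in such regimens differ only slightly when the interval spans several half-lives.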

    Clinical practice guidelines on the management of variceal bleeding

    Get PDF
    Gastroesophageal variceal bleeding occurs in 30-50% of patients with liver cirrhosis and portal hypertension, with 20-70% mortality within one year. Therefore, it is essential to screen these patients for varices and prevent the first episode of bleeding by treating them with β-blockers or endoscopic variceal band ligation. Ideally, patients with variceal bleeding should be treated in a unit where the personnel are familiar with the management of such patients and where routine therapeutic interventions can be undertaken. Proper management of such patients includes initial assessment, resuscitation, blood volume replacement, vasoactive agents, prevention of associated complications such as bacterial infections, hepatic encephalopathy, coagulopathy and thrombocytopenia, and specific therapy. Rebleeding occurs in about 60% of patients within 2 years of their recovery from the first variceal bleeding episode, with 33% mortality. Therefore, it is mandatory that all such patients be started on a combination of β-blockers and band ligation to prevent recurrence of bleeding. Patients who required shunt surgery/TIPSS to control the acute episode do not require further preventive measures. These clinical practice guidelines (CPGs) have been jointly developed by the Pakistan Society of Hepatology (PSH) and the Pakistan Society of Study of Liver Diseases (PSSLD).